Transforming word vectors


In the previous week, I showed you how to plot word vectors. Now you will see how to take a word vector and learn a mapping that allows you to translate words by learning a "transformation matrix". Here is a visualization:

Note that the word "chat" in French means "cat". You can learn the mapping by taking the vector corresponding to "cat" in English, multiplying it by a matrix that you learn, and then computing the cosine similarity between the output and all the French vectors. You should find that the closest result is the vector corresponding to "chat".
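This lookup can be sketched in a few lines of NumPy. The vectors and the matrix R below are made-up illustrative values, not trained embeddings:

```python
import numpy as np

# Hypothetical 2-d English embedding for "cat" (illustrative values only).
en_cat = np.array([1.0, 0.5])

# Hypothetical French embeddings (illustrative values only).
fr_vocab = {
    "chat":  np.array([0.9, 1.4]),
    "chien": np.array([-1.0, 0.2]),
}

# Hypothetical learned transformation matrix.
R = np.array([[0.0, 1.0],
              [1.0, 0.5]])

def cosine_similarity(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

pred = en_cat @ R  # map the English vector into the French embedding space
best = max(fr_vocab, key=lambda w: cosine_similarity(pred, fr_vocab[w]))
print(best)  # -> chat
```

In practice the vocabularies are large, so the nearest-neighbor search over all French vectors is usually done with approximate methods (covered later in this course).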

Here is a visualization of that showing you the aligned vectors:

Note that X corresponds to the matrix of English word vectors, Y corresponds to the matrix of French word vectors, and R is the mapping matrix.

Steps required to learn R:

  • Initialize R

  • In a loop, repeat:

$$Loss = \| XR - Y \|_F$$

$$g = \frac{d}{dR} Loss$$

$$R = R - \alpha g$$
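The loop above can be sketched as follows. The dimensions, learning rate, and number of iterations are illustrative assumptions; the code minimizes the squared Frobenius norm, whose gradient with respect to R is 2 Xᵀ(XR − Y):

```python
import numpy as np

# Synthetic data: X plays the role of the English vectors, Y of the
# French vectors. Here Y is generated to be perfectly alignable.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
R_true = rng.normal(size=(5, 5))
Y = X @ R_true

R = np.zeros((5, 5))   # initialize R
alpha = 0.001          # learning rate (assumed value)

for _ in range(2000):  # the loop
    g = 2 * X.T @ (X @ R - Y)  # gradient of ||XR - Y||_F^2 w.r.t. R
    R = R - alpha * g          # gradient descent update

print(np.linalg.norm(X @ R - Y))  # loss should be close to 0
```

Because the synthetic Y is exactly X @ R_true, gradient descent recovers the true mapping and the final loss approaches zero; with real embeddings the loss only gets as small as the data allows.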

Here is an example to show how the Frobenius norm works.

$$A = \begin{pmatrix} 2 & 2 \\ 2 & 2 \end{pmatrix}$$

$$\|A\|_F = \sqrt{2^2 + 2^2 + 2^2 + 2^2} = 4$$

$$\|A\|_F \equiv \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} \left|a_{ij}\right|^2}$$
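The worked example above checks out in NumPy, both by summing the squared entries directly and with the built-in Frobenius norm:

```python
import numpy as np

# The 2x2 matrix of 2s from the example above.
A = np.array([[2.0, 2.0],
              [2.0, 2.0]])

# Frobenius norm by hand: square root of the sum of squared entries.
by_hand = np.sqrt((np.abs(A) ** 2).sum())

print(by_hand, np.linalg.norm(A, "fro"))  # both print 4.0
```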

In summary, you are making use of the following:

$$\underset{R}{\text{minimize}} \; \| XR - Y \|_F^2$$
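As a side note, this objective is an ordinary least-squares problem, so besides gradient descent it also has a direct solution. A minimal sketch with random placeholder data:

```python
import numpy as np

# Random placeholder matrices standing in for the aligned embeddings.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))
Y = rng.normal(size=(50, 4))

# np.linalg.lstsq finds the R minimizing ||XR - Y||_F^2 in closed form.
R, *_ = np.linalg.lstsq(X, Y, rcond=None)

print(np.linalg.norm(X @ R - Y, "fro"))  # residual at the minimum
```

The course uses the gradient-descent formulation because it generalizes to settings where no closed form exists.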
